Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cherry-pick] [202205] Fix memory leak issue in ConfigDBConnector. #706

Merged
merged 1 commit into from
Nov 14, 2022

Conversation

liuh-80
Copy link
Contributor

@liuh-80 liuh-80 commented Nov 8, 2022

Fix memory leak issue in ConfigDBConnector:
[chassis] Too many open files error and unable to connect to redis socket error sonic-net/sonic-buildimage#10870

The reason of this issue is DBConnector::pubsub() will return a pointer, and following code call this method but never release the returned pointer:

void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bool retry_on)
{
    m_db_name = db_name;
    m_key_separator = m_table_name_separator = get_db_separator(db_name);
    SonicV2Connector_Native::connect(m_db_name, retry_on);

    if (wait_for_init)
    {
        auto& client = get_redis_client(m_db_name);
        auto pubsub = client.pubsub(); <== this pointer not delete later.

Also change DBConnector::pubsub() to deprecated for none SWIG scenario.

Change DBConnector::pubsub() to return a smart pointer.

Pass all test case.

Run following code in python and validate there is no epoll and socket leak:

import gc
from swsscommon import swsscommon
config_db = swsscommon.ConfigDBConnector_Native()
config_db.connect()
config_db.connect()
config_db.connect()

gc.collect()
  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error sonic-net/sonic-buildimage#10870

Co-authored-by: liuh-80 azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net

Fix memory leak issue in ConfigDBConnector:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

The reason of this issue is DBConnector::pubsub() will return a pointer, and following code call this method but never release the returned pointer:

```
void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bool retry_on)
{
    m_db_name = db_name;
    m_key_separator = m_table_name_separator = get_db_separator(db_name);
    SonicV2Connector_Native::connect(m_db_name, retry_on);

    if (wait_for_init)
    {
        auto& client = get_redis_client(m_db_name);
        auto pubsub = client.pubsub(); <== this pointer not delete later.
```

Also change DBConnector::pubsub() to deprecated for none SWIG scenario.

Change DBConnector::pubsub() to return a smart pointer.

Pass all test case.

Run following code in python and validate there is no epoll and socket leak:
```
import gc
from swsscommon import swsscommon
config_db = swsscommon.ConfigDBConnector_Native()
config_db.connect()
config_db.connect()
config_db.connect()

gc.collect()
```

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111

Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
@liuh-80 liuh-80 changed the title Fix memory leak issue in ConfigDBConnector. (#655) Fix memory leak issue in ConfigDBConnector. Nov 8, 2022
@liuh-80 liuh-80 changed the title Fix memory leak issue in ConfigDBConnector. [cherry-pick] [202205] Fix memory leak issue in ConfigDBConnector. Nov 8, 2022
@liuh-80
Copy link
Contributor Author

liuh-80 commented Nov 9, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@liuh-80
Copy link
Contributor Author

liuh-80 commented Nov 10, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@liuh-80
Copy link
Contributor Author

liuh-80 commented Nov 11, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@liuh-80 liuh-80 marked this pull request as ready for review November 11, 2022 02:19
@liuh-80 liuh-80 requested review from qiluo-msft and yxieca November 11, 2022 02:19
@liuh-80
Copy link
Contributor Author

liuh-80 commented Nov 11, 2022

PR been blocked by this UT, however the issue seems not related with the code change in this PR:

2022-11-10T15:51:38.7247858Z test_vlan.py::TestVlan::test_VlanMemberLinkDown FAILED [ 91%]

2022-11-09T04:43:35.3530022Z test_vlan.py::TestVlan::test_VlanMemberLinkDown FAILED [ 91%]

@liuh-80
Copy link
Contributor Author

liuh-80 commented Nov 11, 2022

My another PR on 202205 also have failed on same UT few times:
https://dev.azure.com/mssonic/build/_build/results?buildId=169315&view=results

@liuh-80
Copy link
Contributor Author

liuh-80 commented Nov 11, 2022

Will create another test PR to validate the UT issue not caused by code change in this PR:
#707

@liuh-80
Copy link
Contributor Author

liuh-80 commented Nov 11, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@liuh-80
Copy link
Contributor Author

liuh-80 commented Nov 11, 2022

@qiluo-msft , could you please help do a force merge?
This PR blocked by following UT:
test_vlan.py::TestVlan::test_VlanMemberLinkDown FAILED [ 91%]

And I confirm the UT issue not caused by change in this PR with another PR which change nothing:
#707

Validation also failed on same UT
https://dev.azure.com/mssonic/build/_build/results?buildId=172876&view=logs&j=3f6395b2-1619-5ebe-f305-2aedcf353cb5&t=f9db2383-9349-50ba-61c3-4bde15cec59e

@liuh-80
Copy link
Contributor Author

liuh-80 commented Nov 11, 2022

The UT issue caused by a UT added by this PR: sonic-net/sonic-swss#2469

Seems the UT depends on a kernel patch which not ready on 202205:
sonic-net/sonic-linux-kernel#293

@liuh-80
Copy link
Contributor Author

liuh-80 commented Nov 12, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@rlhui rlhui added the P0 label Nov 12, 2022
@liuh-80
Copy link
Contributor Author

liuh-80 commented Nov 14, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@liuh-80
Copy link
Contributor Author

liuh-80 commented Nov 14, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@liuh-80 liuh-80 merged commit 8fee1b4 into sonic-net:202205 Nov 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants